
fix: Set CUDA arch list for UCCL EP build to SM90+ #1808

Open
thomasdhc wants to merge 2 commits into main from donghyukc/uccl_ep_fix

Conversation

@thomasdhc
Contributor

What does this PR do?

The UCCL EP build fails with PTX errors because setup.py compiles for all CUDA architectures, including SM75,
while the code uses SM90+ features (cp.async.bulk, .bulk_group). This PR sets
TORCH_CUDA_ARCH_LIST="9.0 10.0 12.0" to match the DeepEP build configuration.
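
As a rough illustration of the idea (not the actual change in this PR), the sketch below pins TORCH_CUDA_ARCH_LIST in the environment a build step would inherit and checks that no listed architecture is below SM90. The helper names `build_env` and `min_compute_capability` are hypothetical, and the parser assumes entries are plain `major.minor` values with an optional `+PTX` suffix.

```python
import os

# Architectures taken from the PR description; SM90 is the lowest one listed.
UCCL_EP_ARCH_LIST = "9.0 10.0 12.0"


def build_env(base_env=None):
    """Return a copy of the environment with TORCH_CUDA_ARCH_LIST pinned.

    Hypothetical helper: the real build may set this variable elsewhere.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TORCH_CUDA_ARCH_LIST"] = UCCL_EP_ARCH_LIST
    return env


def min_compute_capability(arch_list):
    """Parse an arch list into its lowest (major, minor) pair.

    Assumes space- or semicolon-separated "major.minor" entries,
    each optionally followed by "+PTX".
    """
    pairs = []
    for entry in arch_list.replace(";", " ").split():
        version = entry.split("+")[0]  # drop an optional "+PTX" suffix
        major, minor = version.split(".")
        pairs.append((int(major), int(minor)))
    return min(pairs)


env = build_env({"PATH": "/usr/bin"})
# Guard against accidentally targeting pre-SM90 architectures such as SM75.
assert min_compute_capability(env["TORCH_CUDA_ARCH_LIST"]) >= (9, 0)
```

The returned dictionary could then be passed as the `env` argument of `subprocess.run` when invoking whatever build command the project uses.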

Changelog

  • Add specific line by line info of high level changes in this PR.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?

If you haven't finished some of the above items you can still open "Draft" PR.

Additional Information

  • Related to # (issue)

Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Labels

r0.4.0 Auto-cherrypick to release branch. Apply before merge; cherrypick happens after merge.
